United States Patent [19]
Snow et al.

US006055540A

[11] Patent Number: 6,055,540
[45] Date of Patent: Apr. 25, 2000



[54] METHOD AND APPARATUS FOR CREATING 
A CATEGORY HIERARCHY FOR 
CLASSIFICATION OF DOCUMENTS 

[75] Inventors: William A. Snow, Redwood City;
Joseph D. Mocker, Cupertino, both of
Calif.

[73] Assignee: Sun Microsystems, Inc.

[21] Appl. No.: 08/874,567
[22] Filed: Jun. 13, 1997

[51] Int. Cl.7 G06F 17/30

[52] U.S. Cl. 707/103; 707/100; 707/3

[58] Field of Search 707/100, 102,
707/103, 104, 3

[56] References Cited 

U.S. PATENT DOCUMENTS 

5,162,992 11/1992 Williams 364/419
5,297,249 3/1994 Bernstein et al. 345/356
5,301,319 4/1994 Thruman et al. 707/103
5,317,646 5/1994 Sang et al. 382/175
5,333,237 7/1994 Stefanopoulos et al. 706/11
5,355,472 10/1994 Lewis 707/101
5,418,946 5/1995 Mori 395/600
5,442,778 8/1995 Pedersen et al. 395/600
5,463,773 10/1995 Sakakibara et al. 707/102
5,568,640 10/1996 Nishiyama et al. 395/600
5,634,051 5/1997 Thomson 395/605
5,649,186 7/1997 Ferguson 395/610
5,706,496 1/1998 Noguchi et al. 395/603
5,721,910 2/1998 Unger et al. 707/100
5,768,578 6/1998 Kirk et al. 395/611
5,778,362 7/1998 Deerwester 707/5
5,778,372 7/1998 Cordell et al. 707/100
5,781,914 7/1998 Stork et al. 707/506
5,787,425 7/1998 Bigus 707/6
5,802,518 9/1998 Karaev et al. 707/9
5,806,068 9/1998 Shaw et al. 707/103
5,809,340 9/1998 Bertone et al. 395/878
5,812,995 9/1998 Sasaki et al. 707/1
5,813,014 9/1998 Gustman 707/103
5,835,712 11/1998 DuFresne 709/203
5,838,965 11/1998 Kavanaugh et al. 707/103
5,862,325 1/1999 Reed et al. 709/201

Primary Examiner — Thomas G. Black
Assistant Examiner — Charles L. Rones 
Attorney, Agent, or Firm — D'Alessandro & Ritchie 

[57] ABSTRACT 

A method for creating a class hierarchy containing categories
for classification of documents. The class hierarchy is
initialized to contain a root category node within a tree data
structure. The root category node is defined by a user-defined
category name. The class hierarchy is displayed to
assist a user in entering a command for manipulating the
class hierarchy. A user may select a category command,
resulting in the class hierarchy containing a plurality of
category nodes. In addition, a user may select a terms
command to manipulate terms defining one of the plurality
of category nodes.

METHOD AND APPARATUS FOR CREATING
A CATEGORY HIERARCHY FOR
CLASSIFICATION OF DOCUMENTS

BACKGROUND OF THE INVENTION 

1. Field of the Invention

The embodiments of the present invention relate to a 
method and apparatus for creating a class hierarchy for 
classification of documents. More particularly, the embodi- 
ments of the present invention relate to a method and 
apparatus for creating document categories within a class 
hierarchy and creating category definitions defining the 
document categories to allow classification of documents 
within the document categories. 

2. Background of the Invention 

Accurate classification of information is typically accom- 
plished through the use of various classes and corresponding 
criteria for classification. Moreover, various types of hier- 
archies may be used for information storage. For example, 
systems using tree data structures are commonly used to 
store related information. 

Retrieval of documents from such a system can be performed
most efficiently when the documents are properly
classified. A need exists in the prior art for a method and
apparatus for creating a class hierarchy containing classes
and criteria for classification. Implementing such a class
hierarchy in a computerized document classification system
would provide for efficient and accurate categorization of
documents. Moreover, it would be extremely beneficial if
such a system were made available to multiple users via a
communications network such as a computer network.

BRIEF DESCRIPTION OF THE INVENTION 

A method and apparatus for creating a class hierarchy
containing categories for classification of documents is
provided. The class hierarchy is then used with a document
classifier to allow classification of documents within the
categories. In accordance with the invention, a plurality of
category nodes and corresponding terms defining each of the
plurality of category nodes are stored within a class hierarchy.
The class hierarchy is initialized to contain a root
category node within a tree data structure, and the root
category node is defined by a user-defined category name.
The class hierarchy is displayed to assist a user in entering
a command for manipulating the class hierarchy. First, a user
may select a category command, resulting in the class
hierarchy containing a plurality of category nodes. The
category command may comprise one of several commands.
These commands include the capability to add a category
node, link a first category node to a second category node,
move a category node, delete a category node, edit a
category node, or display information defining a category
node. Second, a user may select a terms command to
manipulate terms defining one of the plurality of category
nodes. Therefore, terms can be added to one of the plurality
of category nodes or viewed according to user-selected
criteria.
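
The category commands listed above can be illustrated with a minimal Python sketch. It assumes the node fields named in the detailed description (category name, nodetype, NodeID, ParentID, LinkID, entered by). The ClassHierarchy class, its method names, the in-memory dictionary, and the choice to model a link as a child node that points at its target are all hypothetical and are not taken from the patent. Most of the availability rules for each command are omitted.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Optional


@dataclass
class CategoryNode:
    name: str                         # category name, used for display
    node_id: int                      # integer NodeID
    parent_id: Optional[int]          # ParentID; None marks the root
    nodetype: str = "normal"          # "normal", "see", or "see also"
    link_id: Optional[int] = None     # LinkID for "see" / "see also" nodes
    entered_by: str = ""              # user ID of whoever created the node


class ClassHierarchy:
    """Hypothetical in-memory class hierarchy keyed by NodeID."""

    def __init__(self, root_name: str, user: str = "") -> None:
        self._ids = count(0)          # the root becomes the 0 node
        root = CategoryNode(root_name, next(self._ids), None, entered_by=user)
        self.nodes: Dict[int, CategoryNode] = {root.node_id: root}

    def children(self, node_id: int) -> List[CategoryNode]:
        return [n for n in self.nodes.values() if n.parent_id == node_id]

    def add(self, parent_id: int, name: str, user: str = "") -> int:
        if parent_id not in self.nodes:
            raise KeyError(parent_id)
        node = CategoryNode(name, next(self._ids), parent_id, entered_by=user)
        self.nodes[node.node_id] = node
        return node.node_id

    def link(self, parent_id: int, target_id: int, nodetype: str = "see also") -> int:
        # Assumption: a link is stored as a child node whose LinkID names the target.
        if nodetype not in ("see", "see also"):
            raise ValueError("link nodetype must be 'see' or 'see also'")
        node_id = self.add(parent_id, self.nodes[target_id].name)
        self.nodes[node_id].nodetype = nodetype
        self.nodes[node_id].link_id = target_id
        return node_id

    def move(self, node_id: int, new_parent_id: int) -> None:
        self.nodes[node_id].parent_id = new_parent_id   # no cycle check in this sketch

    def delete(self, node_id: int) -> None:
        if self.nodes[node_id].parent_id is None:
            raise ValueError("delete is not available at the root level")
        for child in self.children(node_id):
            self.delete(child.node_id)                  # remove the whole subtree
        del self.nodes[node_id]

    def edit(self, node_id: int, new_name: str) -> None:
        self.nodes[node_id].name = new_name             # rename the category
```

Storing only a ParentID on each node, rather than a list of children, mirrors the field list in the detailed description and makes a move a single field update.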

BRIEF DESCRIPTION OF THE DRAWINGS

The following figures are illustrative and not intended to
limit the scope of the invention.

FIG. 1 illustrates a class hierarchy according to one
embodiment of the present invention.

FIG. 2 is a flow diagram of the main program loop utilized
in creation of the class hierarchy according to an embodiment
of the present invention.

FIG. 3 is a flow diagram of the process category command
procedure shown in the main program loop of FIG. 2
according to one embodiment of the present invention.

FIG. 4 is a flow diagram of the process terms command
procedure shown in the main program loop of FIG. 2
according to one embodiment of the present invention.

FIG. 5 illustrates a document directory hierarchy according
to one embodiment of the present invention.

FIG. 6 is a flow diagram of the main procedure utilized in
creation of the document directory hierarchy according to
one embodiment of the present invention.

FIG. 7 is a flow diagram illustrating a method for searching
the document directory hierarchy according to one
embodiment of the present invention.

FIG. 8 illustrates a method for retrieving matching category
names corresponding to directory paths shown in FIG.
7 according to one embodiment of the present invention.

FIG. 9 illustrates a block diagram of an embodiment of a
computer system implementing the present invention.

DETAILED DESCRIPTION OF THE
EMBODIMENTS OF THE INVENTION

Those of ordinary skill in the art will realize that the
following description of the embodiments of the present
invention is illustrative only and is not intended to be in any
way limiting. Other embodiments of the invention will
readily suggest themselves to such skilled persons from an
examination of the within disclosure.

The embodiments of the present invention provide for
automatic document classification within user-defined categories.
A user can then interactively search for documents
according to search terms within the user-defined categories.
Documents are ranked according to relevance, and a
user-specified number of the most relevant documents are
returned. According to one embodiment, the present invention
is made available to multiple users via a network.

While the embodiments of the present invention are of
broad applicability and can be used in a variety of contexts,
an embodiment of the present invention is designed specifically
to interface with Fulcrum™, an information retrieval
system designed for use by application developers. Fulcrum™
is available from Fulcrum Technologies™, Inc.,
located at 785 Carling Ave, Ottawa, Canada K1S5H4, (613)
238-1761. The functions provided by Fulcrum™ include
text search, indexing and retrieval capabilities. Indexing
creates a summary of all documents within a desired directory
and subdirectories. When a directory is indexed, Fulcrum™
creates an index comprising a document vector for
each document within the selected directory. To create each
document vector, each document is split into terms and a
weight is associated with each of the terms. The term
weights are based upon frequency of occurrence of each
term within the document. The weights, therefore, are higher
when a term occurs multiple times within the document.
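
As a rough illustration of frequency-based term weighting, the Python sketch below builds a document vector by splitting text into terms and counting occurrences. The tokenizer and the use of raw counts as weights are assumptions made for illustration; the weighting scheme actually used by Fulcrum is not described in this document.

```python
import re
from collections import Counter
from typing import Dict


def document_vector(text: str) -> Dict[str, int]:
    """Map each term to a weight equal to its number of occurrences."""
    terms = re.findall(r"[a-z0-9]+", text.lower())   # assumed tokenizer
    return dict(Counter(terms))


vec = document_vector("The server log: server restarted, server online.")
print(vec["server"], vec["log"])   # the repeated term gets the higher weight
```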
Method for Creating a Class Hierarchy 

Referring now to FIG. 1, a class hierarchy 10 according
to one embodiment of the present invention is shown. The
class hierarchy 10 comprises at least one level of categories.
These categories may further comprise sub-categories. The
categories are stored in category nodes 12 within a tree data
structure. According to one embodiment of the present
invention, each of the category nodes 12 corresponds to a
class hierarchy directory 14 equivalent to a category NodeID
16. Each category not further comprising sub-categories will
herein be referred to as a leaf node, or leaf category. Each
leaf category comprises a category definition 18 defining the
leaf category. According to one embodiment of the present
invention, the category definition 18 comprises two groups
of data. The first group of data contains descriptive terms
defining the corresponding leaf category. The second group
of data contains portions of documents which have been
classified by a user as being relevant to the leaf category.
Each category further comprising subcategories is defined
by the terms corresponding to all subcategories within the
category. The class hierarchy 10 is stored in a class hierarchy
database.

According to one embodiment of the present invention,
the category node 12 includes the following fields:

Category name — The category name is used for display
purposes.

Nodetype — There are three possible node types:
"normal", "see" and "see also". "See also" indicates that a
category node can be added at this level, but lists alternate
locations in which it may also be placed. "See" indicates that
a category node cannot be added at this level, and gives
alternate categories in which to place the new category.
"Normal" indicates that the node is either a branch or a leaf
node. The nodetype is set at time of creation of the node.

NodeID — When a node is created, an integer NodeID is
associated with the node. According to one embodiment of
the present invention, each directory within the class hierarchy
comprises a directory name equivalent to the NodeID
of the corresponding category. The NodeID is used rather
than the category name, since each category name may
contain characters that would result in an invalid directory
name.

ParentID — The ParentID field contains an integer indicating
the NodeID of the parent node.

LinkID — The LinkID field is used for "see" and "see
also" node types. The LinkID contains an integer indicating
a NodeID of a desired reference node.

Entered by — The entered by field comprises a user ID
entered by the user. This field provides a means for tracking
system updates and creation of system errors.

One of ordinary skill in the art will recognize that the
category node may comprise fewer or additional fields.

Referring now to FIG. 2, a flow diagram of the main
program loop utilized in creation of the class hierarchy
according to one embodiment of the present invention is
shown. At step 20, initialization of the class hierarchy is
performed. Initially, the class hierarchy comprises a root
category, or root node. In addition, an initial set of descriptive
terms, including document portions, are provided to a
user. The set of initial terms and document portions are
generated from various sources (i.e., keyword tables, search
log files).

The class hierarchy is displayed at step 22. To display the
updated class hierarchy, the class hierarchy is traversed.
First, the root is found by finding the node having no parent.
The NodeID of the root is then obtained. Second, it is
determined whose parent is the NodeID of the root. The
second step is iteratively performed with the NodeID of the
current node until the LinkID indicates that the node contains
no children. Since categories and terms cannot be
added to a node unless a node contains only "see also"
subnodes, this is indicated in the display. For example, if a
node contains only "see also" subnodes, this node will be
displayed to the user. However, if a node contains a subnode
having a "normal" or "see" nodetype, the node will not be
displayed or this limitation will otherwise be indicated to the
user. Alternatively, the class hierarchy may be displayed at
a later step.

Several category commands are available to create the
class hierarchy. These commands include the capability to
add a category node, link a first category node to a second
category node, move a category node, delete a category
node, edit a category node, or display information defining
a category node. Manipulating nodes in a tree data structure
is known in the art of software development.

At step 24, a user-selected command is entered. If the
user-selected command is determined to be a category
command at step 26, the appropriate category command is
processed at step 28, and the routine returns to step 22. If it
is determined that the user-selected command is not a
category command at step 26, it is next determined whether
the user-selected command is a terms command at step 30.
If the user-selected command is a terms command, the
appropriate terms command is processed at step 32, and the
loop is repeated at step 22. The loop is repeated at step 22
until the user chooses to exit the main program loop at step
34. Alternatively, the steps 26-32 may be ordered in various
ways to achieve the same result.

Referring now to FIG. 3, a flow diagram of the process
category command procedure 28 shown in FIG. 2 according
to one embodiment of the present invention is illustrated. If
the user-selected command is determined to be an add
category command at step 36, the add category command is
performed at step 38. This command is used to add a new
category by name, adding a child node to an existing
category. Add is not available if the node is a leaf node or
the node contains a "see" link. Add is available if a node
contains only "see also" sub-nodes. The first category, or
node, comprises a root directory, or 0 node. When a new
category is added, a new category node is created containing
a user-defined category name, a nodetype, a unique NodeID
corresponding to the user-defined category name, a ParentID
corresponding to a NodeID of a parent category node, a
LinkID, and a UserID. This category node is then stored
within the parent category node.

If the user-selected command is not an add category
command, at step 40 it is determined if the user-selected
command is a link category command. If the user-selected
command is a link category command, the link category
command is performed at step 42. The link category command
allows a category to refer to another category using
"see" or "see also". "See also" indicates that one or more
other categories contain related information. "See" indicates
that this category name cannot contain sub-categories. Link
is not available at the root node. Furthermore, the link
command is not available if a node already contains a "see"
link.

If the user-selected command is not a link category
command, at step 44 it is determined if the user-selected
command is a move category command. If the user-selected
command is a move category command, the move category
command is performed at step 46. The move category
command allows a category to be moved to another location
within the class hierarchy.

If the user-selected command is not a move category
command, at step 48 it is determined if the user-selected
command is a delete category command. If the user-selected
command is a delete category command, the delete category
command is performed at step 50. The delete category
command deletes a category and any subtree. This command
is not available at the root level.

If the user-selected command is not a delete category
command, at step 52 it is determined if the user-selected
command is an edit category command. If the user-selected
command is determined to be an edit category command, the
edit category command is performed at step 54. The edit Once the class hierarchy has been created, documents 

category command allows a category to be renamed. may be classified within the class hierarchy. According to 

If the user-selected command is not an edit category the embodiments of the present invention, a document is 

command, at step 56 it is determined if the user-selected classified based on content within the categories within the 

command is a display category command. If the user- 5 class hierarchy. All documents entered into the system are 

selected command is determined to be a display category classified within one or more categories. The disclosed 

command, the display category command is performed at method for classificauon results in correct placement of 

step 58. 'Iliis command displays node information cone- documents by the classifier approximately 80% of the time, 

spondmg to a parUcular categor>^ According to one embodi- Referring now to FIG. 5, a document directory hierarchy 

mem of the present mvention ParenaO and LmklD mfor- embodimem of the present invention is 

mauon are not displayed. If the user-selected command IS . j * j- . i_- l ^t> 

J- 1 « ^. I «u ♦ shown. The document directory hierarchy 68 comprises a 

not a display category command, the process category . - , • . r.t_ _.- . • 

command routine is completed and the program retur^ to P^^^^^^^y document directories 70 Each of the directories 

the main loop of FIG. 2. One of ordinary skill in the art wiU ^0 corresponds to a category withm the class hierarchy, 

recognize that steps 36-58 may performed in an alternate Accordmg to one embodunent of the present mvention, each 

Qf^der. document directories 70 is equivalent to a category 

According to one embodiment of the present invenUon, NodelD 72 of the corresponding category. Once a document 

the category definition for each leaf category in the class 74 is classified within the class hierarchy, the document 74 

hierarchy is stored in a terms database. The terms database is stored within the document directory 68 as shown, 

comprises descriptive terms and portions of documents Referring now to FIG. 6, a flow diagram of the main 

defining all leaf categories. Each of the descriptive terms 20 procedure utilized in creation of the document directory 

include a reference corresponding to at least one leaf cat- hierarchy according to one embodiment of the present 

egory. Similarly, each of the portions of documents include invention is shown. Initially, at step 76, a terms file is created 

a reference corresponding to at least one leaf category, for each category within the class hierarchy. A category 

document name, type of document, fields, or portions, definition for each leaf node is extracted from the terms 

included within the document (i.e., synopsis), and indexing 25 database and stored in a terms file within the category 

information indicating which fields, or portions, are to be directory in the class hierarchy. Therefore, each terms file 

extracted during indexing. Therefore, a category definition wUl contain all terms and portions of documents defining the 

may be defined by different portions of documents depend- particular category. 

ing upon the type of each document. Next, at step 78, a path-to-name listing containing a 
Referring now to FIG. 4, a flow diagram of the process 30 directory path to category name translation for each direc- 
terms command procedure 32 shown in FIG. 2 according to tory within the class hierarchy is created and stored in a 
one embodiment of the present invention is presented. If the translation file. The categories are extracted from the class 
user-selected command is determined at step 60 to be an add hierarchy database by traversing the class hierarchy from the 
terms command, the add terms command is performed at root, finding all the children successively until a leaf node is 
step 62. The "add terms" command allows terms or portions 35 hit. This process is repeated for all children. Each directory 
of documents to be added to the category definition of a within the directory hierarchy comprises a directory name 
particular category. According to one embodiment of the equivalent to the NodelD of the corresponding category, 
present invention, the category must comprise a leaf cat- Next, at step 80, indexing via Fulcrum™ is performed, 
egory. A user can add a term from the initial set of terms, or Since each leaf node initially contains only one term 
add additional user-defined terms or document portions to an 40 document, or term file, indexing is performed on the class 
appropriate category. Multiple terms separated by commas hierarchy which contains the terms files. Fulcmm extracts 
can be entered. A confirmation dialog is presented before a the terms from each term file and weights each term accord- 
term is added or deleted. Terms of a branch node may be ing to frequency of occurrence. Traversing subdirectories is 
edited if the only type of sub-nodes are "see also" links. This standard for the Fulcrum™ search engine. Fulcrum™ is 
command provides a user with the ability to weight the 45 used to index the 0 node, or root directory, which results in 
descriptive terms which define a category. Terms, as well as indexing of each subdirectory within the 0 node, creating a 
each hand-classified document portion, can be given differ- zero node index. Then, for each term within all term files, 
ent weights. This is performed by creating muhiple copies of Fulcrum™ creates a term vector for each of the most 
a selected term or hand-classified dociunent portion within common terms of the document, and corresponding posi- 
the terms database. 50 tioning information. Indexing creates an index file which 

If the user-selected command is not an add terms contains all term file document vectors, 

command, it is next determined if the user-selected com- Next, a document vector is created at step 82 for a 

mand is a view terms command at step 64. If the user- document to be classified. The document vector is created 

selected command is a view terms command, the view terms via Fulcmm to allow classification through comparison with 

command is performed at step 66. The "view terms" com- 55 document vectors created in step 80. 

mand allows a user to view the category definition of a Next, at step 84, a text search is performed within the 

particular category. This command is available only at the ^ class hierarchy to classify a document. The document to be 

root node. This command is used to display aU terms classi fied is compared against all leaf node data to determine 

according to selection criteria. For example, a user may the appropriat e catcgqiyplacement for the docum ent. This is 

display the terms entered by anyone or display the terms 60 performed oy companog document vectors. The documen t 

entered by a user. If the user-selected command is not a view vector created in ste p 82 is compared via FuTcgim™ to the 

terms command, the process terms command procedure is temTfilc docu ment vectors created during t he inde xing step 

complete and the program returns to the main loop of FIG. 8 0 to determine appro ianitC-Catcgpry placementTAs a result, 

2. One of ordinary skill in the art will recognize that steps Fulcrum™ returns a relevance ranking. A lop percentage of 

60-66 may performed in an alternate order. 65 the rankings are utilized to determine appropriate categories 

Method for Classifying a Document within the Qass Hier- for the document. This relevance percentage is configurable 

archy within the system. The document matches one or more 
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categories if it meets (he user-defined criteria, or configured searctU ofm&>ar e compa red to each of the relevant docu ment 

relevance percentage. A result of the search is a list of vect og^teatcd'byTlir go^^ indexing of the doc ument 

matching category names. direct ory hierarc hy. Since the search is directed, the relevant 

At step 86, if the document docs not match any of the document vectors are the document vectors within the index 

categories, the system administrator is notified at step 88. If, 5 corresponding to the desired category, 

however, the document meets the user-defined criteria, a Atslcpl08, a list of document directory paths is obtained 

matching directory path within the class hierarchy is through Fulcrum™. Each of the document directory paths 

obtained at step 90 for one of the matching category names includes a matching document n ame and a directory paih 

utilizing the path-to-name translation file. corresponding to a matchi ng document . The relevant docu- 

Next, any necessary directories corresponding to the lO mentsare ranked accoramg lo relevance using a statistical 

matching directory path are created within the document ranking provided by Fulcrum™. According to one embodi- 

directory hierarchy at step 92. llierefore, the directory ment of the present invention, the user-specified number of 

structure within the class hierarchy will not necessarily be document directory paths which are most relevant are 

equivalent lo that of the document directory hierarchy. As a selected. Then, matching category names are obtained at 

result, only directories containing documents will be created 15 step 110. 

within the document directory hierarchy. Alternatively, a According to the directed search, the documents are 

directory structure equivalent to that of Uie class hierarchy grnn^^ttgjt^in the matching category names at step_ LL2. 

may be created in a prior step. Next, information corresponding to each document is 

Next, at step 94, the document is added to a leaf directory pla yed by catc^ory.at step 11_4. According to one embodi- 

within the document directory hierarchy corresponding to 20 ment of the present invention, the document information 

the matching directory path. According to one embodiment includes a synopsis and document link. Upon completion of 

of the present invention, the document is symboUcally the directed search, a user may choose to quit at step 116. If 

Unked via Unix to the directory corresponding to the match- the user does not choose to quit, the user enters another user 

ing category. query at step 102 and the loop is repeated. — 

At step 96, it is determined whether there are more 25 If at step 104, if the user query does not include a user 

matching directory paths to which the document must be selected category, an undirected search is performed. At step 

linked. If there are more matching directory paths, a match- 118, the se arch terms are compared to each of the relevan t 

ing directory path for the next matching category is retrieved doc umcnt ^j ^ctors cr eated Hy the-doGuaacaLin dexing of the 

at step 90, and the loop is repeated. do cument director^ hipraff^^y Since the search is 

Once the document has been linked within the document 30 undirected, tne relevant document vectors are the document 
directory hierarchy, more documents may be classified by a vectors within the zero node index, 
user at step 98. When no more categories meet the user- At step 120, a list of document directory paths is obtained 
defined criteria, more documents naay be classified by a user. through Fulcrum. Each of the document directory paths 
When all documents have been classified, each directory includes ajnatching document name and a di rec tory path 
within the document directory hierarchy is indexed at step 35 cnrrespQndjn g ^ ma^g^ inp, d ocument. T he relevant docu- 
100. When a branch node is indexed, all documents in menls are ranked according to relevance using a statistical 
sub-nodes of that node are indexed. Thus, only indexes ranking provided by Fulcrum™. Then, matching category 
corresponding to modified directories and the parent nodes names are obtained at step 110. 
of the modified directories will be updated. Alternatively, According to the undirected search, all relevant category 
only modified directories and any parent nodes of the 40 names obtained in step 110 are sorted by relevance at step 
modified directories may be indexed. If there are more 122. Duplicate category names are removed from each of the 
documents to classify at step 98, a document vector is sorted relevant category names, and the unique sorted rel- 
created at step 82 and the loop is repeated. evant category names are displayed at step 124. Upon 
Method for Searching the Document Directory Hierarchy completion of the undirected search, a user may choose to 
jL^2. Referring now to FIG. 7, a flow diagram illustrating a 45 quit at step 116, If the user does not choose to quit, the user 
method for searching die document directory hierarchy enters another user query at step 102 and the loop is 
according to one embodiment of the present invention is repeated. Therefore, the user may select appropriate 
shown. According to one embodiment of the present
invention, two search methods are provided for searching
the document directory hierarchy in response to a user query.
According to a first method, an undirected search, a user
query may comprise one or more search terms. In addition,
after search results are obtained, the user can modify the
original search terms to further limit the search. According
to a second method, a directed search, a user query may
comprise one or more search terms and a selected category
to provide a more limited search. Once results are obtained,
the user can then select one or more categories or modify the
search terms to run a more limited search. According to one
embodiment of the present invention, the user specifies a
number of documents desired.

At step 102, a user query is obtained. The user query
comprises a number of documents desired and one or more
search terms. In addition, the user query may include a user
selected category.

Next, at step 104, if the user query includes a user selected
category, a directed search is performed. At step 106, the

categories, alter the search terms, and re-run the search.

Referring now to FIG. 8, the method for retrieving
matching category names 110 corresponding to directory
paths shown in FIG. 7 according to one embodiment of the
present invention is presented. At step 126, a matching
document directory path is obtained from the matching
document directory paths. Next, at step 128, a document
name is removed from the document directory path to obtain
a directory path. Next, at step 130, a corresponding category
name is obtained by performing a search, or look up, for the
directory path in the path-to-name listing to obtain a
category name. At step 132, if there are more matching
document paths, the loop is repeated at step 126. However, if
none of the matching document paths remain, each of the
matching category names has been retrieved.

Referring now to FIG. 9, a block diagram of an embodiment
of a computer system 134 implementing the present
invention is shown. According to this embodiment, the
present invention is stored in a main memory 136 or a
secondary memory 138 of the computer system 134 for use
by a processor 140. The computer system 134 may be
connected to a computer network 142 through transmission
lines 144. Those of ordinary skill in the art will readily
recognize that the present invention may also be used in a
standalone computer system, which is by definition not part
of a computer network.
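For illustration only, the FIG. 8 loop (steps 126 through 132) can be sketched in Python as follows. The function name, the use of a dictionary for the path-to-name listing, the example paths, and the suppression of repeated category names are all assumptions made for this sketch and are not taken from the specification.

```python
import posixpath


def matching_category_names(document_paths, path_to_name):
    """Map matching document directory paths to category names.

    document_paths: iterable of document paths (hypothetical format).
    path_to_name:   dict standing in for the path-to-name listing.
    """
    names = []
    for doc_path in document_paths:              # step 126: next matching path
        directory = posixpath.dirname(doc_path)  # step 128: drop the document name
        name = path_to_name.get(directory)       # step 130: look up the directory
        # Assumption: a category is reported once even if several of its
        # documents matched, and unknown directories are skipped.
        if name is not None and name not in names:
            names.append(name)
    return names                                 # step 132: no paths remain


listing = {"/docs/1/2": "Hardware", "/docs/1/3": "Software"}
paths = ["/docs/1/2/a.html", "/docs/1/2/b.html", "/docs/1/3/c.html"]
print(matching_category_names(paths, listing))
```

A dictionary keeps the look up of step 130 to a single operation per path; any listing that maps a directory path to a category name would serve equally well.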
Alternative Embodiments 

According to an alternative embodiment, the search may 
be performed on the index created from the terms files 
within the class hierarchy rather than the index created from 
the documents in the document directory hierarchy. 
However, although this method is more efficient than other
embodiments, it allows a category which contains no
documents to be displayed to the user. Since the user may select
a category which contains no documents, this may be
confusing to the user.

According to another alternative embodiment, the search 
may be performed on the index created from the terms files 
within the class hierarchy in addition to the index created 
from the documents. This would be helpful to find related 
categories of information.
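As a rough sketch of the second alternative embodiment, the two indexes can be searched side by side and the results compared. The index layout (a dictionary from a key to a set of lowercase words), the function names, and the split into "populated" and "related_only" results are hypothetical choices for this example, not details given in the specification.

```python
def search_index(index, terms):
    """Return the keys whose indexed words contain every search term."""
    wanted = {t.lower() for t in terms}
    return {key for key, words in index.items() if wanted <= words}


def related_categories(terms, terms_index, document_index, category_of):
    """Search the terms-file index in addition to the document index.

    terms_index:    category name -> words from its terms file (assumed layout)
    document_index: document path -> words from the document (assumed layout)
    category_of:    document path -> category name (assumed helper mapping)
    """
    from_terms = search_index(terms_index, terms)
    from_docs = {category_of[d] for d in search_index(document_index, terms)}
    return {
        "populated": from_docs,                   # categories with matching documents
        "related_only": from_terms - from_docs,   # matched through terms files only
    }


terms_index = {"Hardware": {"cpu", "memory"}, "Software": {"compiler", "memory"}}
document_index = {"/docs/hw/a.html": {"cpu", "memory", "bus"}}
category_of = {"/docs/hw/a.html": "Hardware"}
print(related_categories(["memory"], terms_index, document_index, category_of))
```

Keeping the two result sets apart lets a caller mark categories that matched only through their terms files, which addresses the concern that such a category may hold no matching documents.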

Although illustrative embodiments and applications of
this invention are shown and described herein, many
variations and modifications are possible which remain within the
concept, scope, and spirit of the invention, and these
variations would become clear to those of skill in the art after
perusal of this application. The invention, therefore, is not to
be limited except in the spirit of the appended claims in light
of their full scope of equivalents.

What is claimed is: 

1. A method for creating a class hierarchy for
categorization of documents within a memory, the class hierarchy
for use with a document classification system capable of
classifying a document based on content within the class
hierarchy, the method comprising:

initializing the class hierarchy, the class hierarchy having
a root category node within a tree data structure, the
root category node having a user-defined category
name;

displaying the class hierarchy;

accepting a user-selected command for manipulating the
class hierarchy;

processing a category command in response to the
user-selected command having a first predefined state,
causing the class hierarchy to contain a plurality of category
nodes, said processing the category command further
comprising:

storing a category name in one of the plurality of
category nodes, wherein each of the plurality of
category nodes corresponds to a unique directory;

storing a NodeID within one of the plurality of category
nodes, the NodeID defining the unique directory;

storing a nodetype within one of the plurality of
category nodes, the nodetype when having a predefined
type allowing a new category node to be added to a
selected one of the plurality of category nodes, and
otherwise preventing the new category node from
being added to the selected one of the plurality of
category nodes;

storing a ParentID within one of the plurality of
category nodes, the ParentID indicating a NodeID of a
parent category node; and

storing a LinkID within a first one of the plurality of
category nodes, the LinkID indicating a NodeID of
a second one of the plurality of category nodes when
the nodetype is of a predefined type; and

processing a terms command in response to the
user-selected command having a second predefined state,
the terms command manipulating terms defining one of
the plurality of category nodes.

2. The method according to claim 1, said displaying 
further comprising displaying information corresponding to 
at least one of the plurality of category nodes within the class 
hierarchy, the information indicating a nodetype for the at 
least one of the plurality of category nodes. 

3. The method according to claim 1, wherein said
processing a category command comprises:

optionally adding a first one of the plurality of category
nodes to a second one of the plurality of category nodes
when the nodetype of the second one of the plurality of
category nodes has a predefined type, the first one of
the plurality of category nodes being a new category
node, and the second one of the plurality of category
nodes being an existing category node, said optionally
adding a category further including:

accepting a user-defined category from an input device;

storing the user-defined category within the new category
node;

storing a nodetype within the new category node;

storing a unique NodeID corresponding to the
user-defined category within the new category node;

storing a ParentID corresponding to a NodeID of the
existing category node within the new category node;
and

storing a LinkID within the new category node, the
LinkID indicating a NodeID of one of the plurality of
category nodes when the nodetype is of a predefined
type.

4. The method according to claim 1, wherein said option- 
ally processing a category command comprises: 

optionally creating a link from a first one of the plurality
of category nodes to a second one of the plurality of
category nodes when the first one of the plurality of
category nodes is not a root node and when it has a
nodetype of a predefined type, the LinkID of the first
one of the plurality of category nodes referring to the
NodeID of the second one of the plurality of category
nodes.

5. The method according to claim 1, wherein said option- 
ally processing a category command comprises: 

optionally moving one of the plurality of category nodes
within the class hierarchy, said optionally moving
further including altering the ParentID of the one of the
plurality of category nodes.

6. The method according to claim 1, wherein said option- 
ally processing a category command comprises: 

optionally editing one of the plurality of category nodes
within the class hierarchy, said optionally editing fur- 
ther including altering the category name of the one of 
the plurality of category nodes. 

7. A computer system for creating a class hierarchy for 
categorization of documents within a memory, the class 
hierarchy for use with a document classification system 
capable of classifying a document based on content within 
the class hierarchy, the computer system comprising: 

a processor; and 

a memory having stored therein the following: 

means for initializing the class hierarchy, the class 
hierarchy having a root category node within a tree 
data structure, the root category node having a user- 
defined category name; 
means for displaying the class hierarchy; 
means for accepting a user-selected command for 
manipulating the class hierarchy; 
means for processing a category command in response
to the user selected command having a first
predefined state, causing the class hierarchy to contain
a plurality of category nodes, said means for
processing the category command further comprising:
means for storing a category name in one of the
plurality of category nodes, wherein each of the
plurality of category nodes corresponds to a
unique directory;
means for storing a NodeID within one of the
plurality of category nodes, the NodeID defining
the unique directory;
means for storing a nodetype within one of the
plurality of category nodes, the nodetype when
having a predefined type allowing a new category
node to be added to a selected one of the plurality
of category nodes, and otherwise preventing the
new category node from being added to the
selected one of the plurality of category nodes;
means for storing a ParentID within one of the
plurality of category nodes, the ParentID
indicating a NodeID of a parent category node; and
means for storing a LinkID within a first one of the
plurality of category nodes, the LinkID indicating
a NodeID of a second one of the plurality of
category nodes when the nodetype is of a
predefined type; and
means for processing a terms command in response to the
user selected command having a second predefined
state, the terms command manipulating terms defining
one of the plurality of category nodes.

8. The computer system according to claim 7, the means 
for displaying further comprising: 

means for displaying information corresponding to at 
least one of the plurality of category nodes within the 
class hierarchy, the information indicating a nodetype 
for the at least one of the plurality of category nodes. 

9. The computer system according to claim 7, wherein the 
means for optionally processing a category command com- 
prises: 

means for optionally adding a first one of the plurality of
category nodes to a second one of the plurality of
category nodes when the nodetype of the second one of
the plurality of category nodes has a predefined type,
the first one of the plurality of category nodes being a
new category node, and the second one of the plurality
of category nodes being an existing category node, the
means for optionally adding a category further
including

means for accepting a user-defined category from an input
device;

means for storing the user-defined category within the
new category node;

means for storing a nodetype within the new category
node;

means for storing a unique NodeID corresponding to the
user-defined category within the new category node;

means for storing a ParentID corresponding to a NodeID
of the existing category node within the new category
node; and

means for storing a LinkID within the new category node,
the LinkID indicating a NodeID of one of the plurality
of category nodes when the nodetype is of a predefined
type.

10. The computer system according to claim 7, wherein
the means for optionally processing a category command
comprises:

means for optionally creating a link from a first one of the
plurality of category nodes to a second one of the
plurality of category nodes when the first one of the
plurality of category nodes is not a root node and when
it has a nodetype of a predefined type, the LinkID of the
first one of the plurality of category nodes referring to
the NodeID of the second one of the plurality of
category nodes.

11. The computer system according to claim 7, wherein
the means for optionally processing a category command
comprises:

means for optionally moving one of the plurality of
category nodes within the class hierarchy, said
optionally moving further including altering the ParentID of
the one of the plurality of category nodes.

12. The computer system according to claim 7, wherein 
the means for optionally processing a category command 
comprises: 

means for optionally editing one of the plurality of
category nodes within the class hierarchy, said
optionally editing further including altering the category
name of the one of the plurality of category nodes.

13. A computer-readable medium recording software, the 
software disposed on a computer to perform a method for 
creating a class hierarchy for categorization of documents 
within a memory, the class hierarchy for use with a docu- 
ment classification system capable of classifying a document 
based on content within the class hierarchy, the method 
comprising: 

initializing the class hierarchy, the class hierarchy having
a root category node within a tree data structure, the
root category node having a user-defined category
name;

displaying the class hierarchy;

accepting a user-selected command for manipulating the
class hierarchy;

processing a category command in response to the
user-selected command having a first predefined state,
causing the class hierarchy to contain a plurality of category
nodes, said processing the category command further
comprising:

storing a category name in one of the plurality of
category nodes, wherein each of the plurality of
category nodes corresponds to a unique directory;

storing a NodeID within one of the plurality of category
nodes, the NodeID defining the unique directory;

storing a nodetype within one of the plurality of
category nodes, the nodetype when having a predefined
type allowing a new category node to be added to a
selected one of the plurality of category nodes, and
otherwise preventing the new category node from
being added to the selected one of the plurality of
category nodes;

storing a ParentID within one of the plurality of
category nodes, the ParentID indicating a NodeID of a
parent category node; and

storing a LinkID within a first one of the plurality of
category nodes, the LinkID indicating a NodeID of
a second one of the plurality of category nodes when
the nodetype is of a predefined type; and

processing a terms command in response to the
user-selected command having a second predefined state,
the terms command manipulating terms defining one of
the plurality of category nodes.

14. The computer-readable medium according to claim 
13, said displaying further comprising: 

displaying information corresponding to at least one of
the plurality of category nodes within the class
hierarchy, the information indicating a nodetype for the
at least one of the plurality of category nodes.

15. The computer-readable medium according to claim
13, wherein said optionally processing a category command
comprises:

optionally adding a first one of the plurality of category
nodes to a second one of the plurality of category nodes
when the nodetype of the second one of the plurality of
category nodes has a predefined type, the first one of
the plurality of category nodes being a new category
node, and the second one of the plurality of category
nodes being an existing category node, said optionally
adding a category further including

accepting a user-defined category from an input device;

storing the user-defined category within the new category
node;

storing a nodetype within the new category node;

storing a unique NodeID corresponding to the
user-defined category within the new category node;

storing a ParentID corresponding to a NodeID of the
existing category node within the new category node;
and

storing a LinkID within the new category node, the
LinkID indicating a NodeID of one of the plurality of
category nodes when the nodetype is of a predefined
type.

16. The computer-readable medium according to claim
13, wherein said optionally processing a category command
comprises:

optionally creating a link from a first one of the plurality
of category nodes to a second one of the plurality of
category nodes when the first one of the plurality of
category nodes is not a root node and when it has a
nodetype of a predefined type, the LinkID of the first
one of the plurality of category nodes referring to the
NodeID of the second one of the plurality of category
nodes.

17. The computer-readable medium according to claim 
13, wherein said optionally processing a category command 
comprises: 

optionally moving one of the plurality of category nodes
within the class hierarchy, said optionally moving
further including altering the ParentID of the one of the
plurality of category nodes.

18. The computer-readable medium according to claim
13, wherein said optionally processing a category command 
comprises: 

optionally editing one of the plurality of category nodes 
within the class hierarchy, said optionally editing fur- 
ther including altering the category name of the one of 
the plurality of category nodes. 

19. A computer data signal embodied in a carrier wave
and representing sequences of instructions which, when
executed by a processor, cause said processor to create a
class hierarchy for categorization of documents within a
memory, the class hierarchy for use with a document
classification system capable of classifying a document based on
content within the class hierarchy, by performing the
following:

initializing the class hierarchy, the class hierarchy having
a root category node within a tree data structure, the
root category node having a user-defined category
name;

displaying the class hierarchy;

accepting a user-selected command for manipulating the
class hierarchy;

processing a category command in response to the
user-selected command having a first predefined state,
causing the class hierarchy to contain a plurality of category
nodes, said processing the category command further
comprising:

storing a category name in one of the plurality of category
nodes, wherein each of the plurality of category nodes
corresponds to a unique directory;

storing a NodeID within one of the plurality of category
nodes, the NodeID defining the unique directory;

storing a nodetype within one of the plurality of category
nodes, the nodetype when having a predefined type
allowing a new category node to be added to a selected
one of the plurality of category nodes, and otherwise
preventing the new category node from being added to
the selected one of the plurality of category nodes;

storing a ParentID within one of the plurality of category
nodes, the ParentID indicating a NodeID of a parent
category node; and

storing a LinkID within a first one of the plurality of
category nodes, the LinkID indicating a NodeID of a
second one of the plurality of category nodes when the
nodetype is of a predefined type; and

processing a terms command in response to the
user-selected command having a second predefined state,
the terms command manipulating terms defining one of
the plurality of category nodes.

20. The computer data signal according to claim 19, said 
displaying further comprising: 

displaying information corresponding to at least one of
the plurality of category nodes within the class
hierarchy, the information indicating a nodetype for the
at least one of the plurality of category nodes.

21. The computer data signal according to claim 19,
wherein said optionally processing a category command
comprises:

optionally adding a first one of the plurality of category
nodes to a second one of the plurality of category nodes
when the nodetype of the second one of the plurality of
category nodes has a predefined type, the first one of
the plurality of category nodes being a new category
node, and the second one of the plurality of category
nodes being an existing category node, said optionally
adding a category further including

accepting a user-defined category from an input device;

storing the user-defined category within the new category
node;

storing a nodetype within the new category node;

storing a unique NodeID corresponding to the
user-defined category within the new category node;

storing a ParentID corresponding to a NodeID of the
existing category node within the new category node;
and

storing a LinkID within the new category node, the
LinkID indicating a NodeID of one of the plurality of
category nodes when the nodetype is of a predefined
type.

22. The computer data signal according to claim 19,
wherein said optionally processing a category command
comprises:

optionally creating a link from a first one of the plurality
of category nodes to a second one of the plurality of
category nodes when the first one of the plurality of
category nodes is not a root node and when it has a
nodetype of a predefined type, the LinkID of the first
one of the plurality of category nodes referring to the
NodeID of the second one of the plurality of category
nodes.

23. The computer data signal according to claim 19, 
wherein said optionally processing a category command 
comprises: 

optionally moving one of the plurality of category nodes
within the class hierarchy, said optionally moving
further including altering the ParentID of the one of the
plurality of category nodes.

24. The computer data signal according to claim 19,
wherein said optionally processing a category command 
comprises: 

optionally editing one of the plurality of category nodes 
within the class hierarchy, said optionally editing fur- 
ther including altering the category name of the one of 
the plurality of category nodes. 
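
By way of illustration only, the node fields recited in the claims above (category name, NodeID, nodetype, ParentID, LinkID) and the add, link, move, and edit commands can be pictured with the following Python sketch. The class and method names, the two nodetype values "category" and "link", the integer NodeIDs, and the directory naming scheme are assumptions made for this example; the claims speak only of "a predefined type" and do not fix any of these details.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical nodetype values; the claims only refer to "a predefined type".
CATEGORY = "category"   # may receive new child category nodes
LINK = "link"           # may hold a LinkID, may not receive children


@dataclass
class CategoryNode:
    node_id: int                   # NodeID
    name: str                      # category name
    nodetype: str                  # CATEGORY or LINK (assumed values)
    parent_id: Optional[int]       # ParentID; None only for the root
    link_id: Optional[int] = None  # LinkID, used only by LINK nodes


class ClassHierarchy:
    def __init__(self, root_name: str) -> None:
        self.nodes: Dict[int, CategoryNode] = {}
        self._next_id = 1
        self.root = self._store(root_name, CATEGORY, None)

    def _store(self, name, nodetype, parent_id):
        node = CategoryNode(self._next_id, name, nodetype, parent_id)
        self.nodes[node.node_id] = node
        self._next_id += 1
        return node

    def add_category(self, parent_id, name, nodetype=CATEGORY):
        # The existing node's nodetype decides whether a child may be added.
        if self.nodes[parent_id].nodetype != CATEGORY:
            raise ValueError("parent nodetype does not allow new category nodes")
        return self._store(name, nodetype, parent_id)

    def create_link(self, node_id, target_id):
        node = self.nodes[node_id]
        if node.parent_id is None or node.nodetype != LINK:
            raise ValueError("only a non-root node of the link type can hold a LinkID")
        if target_id not in self.nodes:
            raise KeyError(target_id)
        node.link_id = target_id

    def move(self, node_id, new_parent_id):
        # Moving alters only the ParentID. This sketch does not check for cycles.
        node = self.nodes[node_id]
        if node.parent_id is None:
            raise ValueError("the root category node cannot be moved")
        if self.nodes[new_parent_id].nodetype != CATEGORY:
            raise ValueError("new parent nodetype does not allow child nodes")
        node.parent_id = new_parent_id

    def rename(self, node_id, new_name):
        # Editing alters only the category name.
        self.nodes[node_id].name = new_name

    def directory(self, node_id):
        # Assumed scheme: the directory is the chain of NodeIDs from the root.
        parts = []
        current = self.nodes[node_id]
        while current is not None:
            parts.append(str(current.node_id))
            current = self.nodes.get(current.parent_id)
        return "/".join(reversed(parts))
```

Because a move or an edit changes only the ParentID or the category name, the NodeID of a node stays fixed for its lifetime in this sketch, which is one way a NodeID could continue to define a unique directory.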

***** 